Augmented reality conferencing system and method

ABSTRACT

The disclosure provides an augmented reality virtual videoconference for each of a plurality of computing devices during a networked communication session. The networked communication session is defined and provided to a plurality of devices. Video content that is at least partially captured by a camera associated with a respective device is received, and a composited interactive audio/video feed comprised of audio/video input received during the networked communication session from each of the first user computing device and at least the respective user computing device of the one of the additional users is generated. At least some of the video content captured by the camera associated with a respective user computing device is removed prior to including the remaining video content in the composited interactive audio/video feed. The composited interactive audio/video feed is provided to the plurality of computing devices during the networked communication session.

CROSS-REFERENCE TO RELATED PATENT APPLICATIONS

This application is based on and claims priority to U.S. Provisional Patent Application Ser. No. 62/819,501, filed Mar. 15, 2019, which is incorporated by reference in its entirety as if expressly set forth herein. Further, this application is based on and claims priority to: U.S. Provisional Patent Application Ser. No. 62/832,751, filed Apr. 11, 2019; U.S. Provisional Patent Application Ser. No. 62/833,396, filed Apr. 12, 2019; and U.S. Provisional Patent Application Ser. No. 62/858,143, filed Jun. 6, 2019, each of which is incorporated by reference in its respective entirety as if expressly set forth herein. Further, this application incorporates by reference U.S. patent application Ser. No. 16/537,201, filed on Aug. 9, 2019 in its respective entirety as if expressly set forth herein.

FIELD

The present invention relates to networking, including in connection with one or more implementations for improving collaboration.

BACKGROUND

Despite the connectivity provided by the plethora of computing devices, communication devices, and networks, there remains a divide between the people using such devices and their actual presence.

Increasingly in the digital era, presence remains a technological challenge. People are together, but they are not together. This is caused at least in part by the limitations of the technology that people use. For example, during video conferences users' faces are often facing downward while they are looking at phones, which contributes to feelings of isolation and disconnect. People meeting virtually with others using contemporary remote access solutions do not always share a sense of presence, due to limitations in technology.

BRIEF SUMMARY

The disclosure provides an augmented reality virtual videoconference for each of a plurality of computing devices during a networked communication session. The networked communication session is defined and provided to a plurality of devices. Video content that is at least partially captured by a camera associated with a respective device is received, and a composited interactive audio/video feed comprised of audio/video input received during the networked communication session from each of the first user computing device and at least the respective user computing device of the one of the additional users is generated. At least some of the video content captured by the camera associated with a respective user computing device is removed prior to including the remaining video content in the composited interactive audio/video feed. The composited interactive audio/video feed is provided to the plurality of computing devices during the networked communication session.

In one or more implementations, the disclosure provides receiving, by the at least one processor, initialization information from at least one user computing device, wherein the initialization information includes video content captured by a camera associated with the at least one user computing device.

In one or more implementations, the disclosure provides processing, by the at least one processor, the initialization information to detect objects and corresponding information associated with the detected objects.

In one or more implementations, the detected objects and corresponding information include at least one plane of the object.

In one or more implementations, the disclosure provides using, by the at least one processor, machine learning to process the initialization information.

In one or more implementations, the machine learning can be implemented for at least image processing.

In one or more implementations, the disclosure provides providing, by the at least one processor, an augmented reality view of the remaining video content in the composited interactive audio/video feed as a function of movement of a viewer of the composited interactive audio/video feed.

In one or more implementations, the augmented reality view includes adjusting for skew and angle of view.

In one or more implementations, at least one of the additional user computing device(s) communicates on the networked communication session via one or more of Real Time Streaming Protocol, Web Real-Time Communication and/or hypertext transport protocol live streaming.

In one or more implementations, the disclosure provides receiving, by the at least one processor, from the additional user computing devices information representing an interaction by the one of the user computing devices; and providing a representation of the interaction to each other user computing device.

Other features of the present disclosure are shown and described herein.

BRIEF DESCRIPTION OF THE DRAWINGS

Aspects of the present disclosure will be more readily appreciated upon review of the detailed description of its various embodiments, described below, when taken in conjunction with the accompanying drawings, of which:

FIG. 1 is a diagram illustrating an example hardware arrangement that operates for providing the systems and methods disclosed herein;

FIG. 2 is a block diagram that illustrates functional elements of a computing device in accordance with an embodiment;

FIG. 3 is an example high-level diagram that illustrates interactivity between various ones of the devices illustrated in FIG. 1;

FIGS. 4-19D illustrate implementations of the present application; and

FIG. 20 is a flowchart identifying steps associated with an implementation of the present disclosure.

DETAILED DESCRIPTION

By way of summary and introduction, the present disclosure includes a plurality of technological features, vis-à-vis user computing devices that are specially configured with hardware and software modules. During an interactive, on-line video conference, the devices operate and interact in physical and/or virtual reality environments, and include augmented reality. For example, one or more computer-generated images can be superimposed on live video content and can affect at least one device's view of the real world. In this way, a composited interactive experience is provided, including by supplementing live video content with audio/visual content for users to experience a real-world environment that is augmented by computer-generated content. The present disclosure solves technological shortcomings associated with user engagement and presence.

In one or more implementations of the present disclosure, numerous forms of deep learning for computing device(s) are supported, based on various data representations that are or have been used for training. For example, deep learning architectures, such as deep neural networks, deep belief networks, and recurrent neural networks, can be applied to computer vision, speech recognition, natural language processing, audio recognition, social network filtering, machine translation, image analysis, and inspection of physical elements to provide augmented reality in an interactive video conferencing and other environment. Deep learning processes can be implemented to extract features of a respective physical environment, such as to detect aspects thereof, such as tables, televisions, whiteboards, computers, and other various objects. The detected elements can be extracted and transformed, including to provide a composite representation of the physical environment therewith.

In one or more implementations, the present disclosure provides for interactive remote video conferencing that is enhanced by augmented reality and/or virtual reality. A virtual interactive platform is presented that creates the same sensory illusion for a person who is not physically present somewhere, such as in a conference room, with someone who is physically sitting in that location. The present disclosure can use one or more deep learning capabilities during an on-line video conference to process audio-visual content from a device operated by the remotely located person substantially in real-time to extract the person from the background. Thereafter, such as by using a collaboration stack, e.g., the TEAMTIME collaboration stack, the remotely located person can be rendered to appear placed in the same room, and around the same conference table, with other person(s) who are physically located in the room. This provides a realistic impression as though the people, including the physically located person(s) and one or more remotely located person(s), are all physically meeting together. This accomplishes a sense of presence that is unprecedented and allows people to feel like they truly meet each other and collaborate.

In one or more implementations, augmented reality headgear can be worn by users to experience an augmented reality video conference session of the present disclosure, including to experience audio, visual, and other content provided via augmented reality. Alternatively (or in addition), one or more audio/visual projection systems can be used for providing virtual representations of an on-line video conference session that appears as though remotely located person(s) are in the same location, e.g., around a conference table, with person(s) who are physically at the location.

Using deep learning and other machine-learning, artificial intelligence, and/or other techniques, various real-world objects such as tables, whiteboards, televisions, and computers can be measured and used by the present disclosure for users to see, hear, virtually navigate, interact with, manipulate, and otherwise use. For example, as a number of people physically sit at a conference room table, others who are located remotely and captured in video, can virtually sit with them.

In operation, one or more masks, as known in the art, can be applied to images of person(s) who are located remotely from a physical location, and the images can be processed such that background and other content is removed and the individuals are virtually extracted therefrom. Thereafter, using measurements, the extracted content (e.g., the people) can be placed in images of the physical environment, such as in chairs, standing around whiteboards, or the like. One or more processors can be configured by executing instructions on processor-readable media, such as non-transitory processor-readable media, to carry out features and functionality shown and described herein. For example, images can be processed to detect various objects, including planes of such objects that have been determined by deep learning or other techniques. Alternatively, or in addition, a graphical user interface can include graphical controls for users to identify and/or select respective objects in a room and define aspects associated with the objects. For example, selection tools can be used to outline or otherwise define specific objects, and information representing such objects can be generated for use in providing the teachings herein.
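The mask-and-place step described above can be illustrated with a minimal sketch. It assumes a binary mask (1 for a pixel belonging to the person, 0 for background) has already been produced by some segmentation technique, and it represents images as nested lists of RGB tuples purely for illustration. The function name composite_person and its parameters are hypothetical and are not taken from the disclosure.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[int]]  # 1 = keep (person), 0 = discard (background)


def composite_person(room: Image, person: Image, mask: Mask,
                     top: int, left: int) -> Image:
    """Copy only the masked pixels of `person` into a copy of `room`.

    `top` and `left` give the row and column in the room image where the
    upper-left corner of the person image is placed (hypothetical inputs,
    e.g. derived from a detected chair position).
    """
    out = [row[:] for row in room]  # leave the original room image untouched
    for y, mask_row in enumerate(mask):
        for x, keep in enumerate(mask_row):
            ry, rx = top + y, left + x
            inside = 0 <= ry < len(out) and 0 <= rx < len(out[0])
            if keep and inside:
                out[ry][rx] = person[y][x]
    return out
```

Pixels where the mask is 0 are never copied, so the remote participant's background is dropped and the room image shows through in those positions.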

The systems and methods of the present disclosure can include a form of initialization, whereby a location such as a room is photographed by an imaging device, such as a camera, associated with the user computing device. For example, a user points a camera associated with the user computing device at a location and/or around a room prior to or during an online video conference. One or more software modules executed by one or more processors cause the processor(s) to process video content as a function of the initialization, and to detect specific objects and corresponding information, such as a table and where the table is located, where the plane of the table is, where chairs are situated, and which chairs are occupied, or the like. Thereafter, virtual representations of remotely located people can be placed in the scene, such as at the table in the room in places that are not occupied by people physically located in the room (or already located remotely and placed in the room virtually). The virtual representations of people (e.g., digital avatars) can be placed in respective locations where they can remain on a persistent basis throughout the on-line virtual videoconference session.
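One possible way to keep track of the placement described above is sketched below. It assumes that initialization has already yielded a list of detected chairs and which of them are physically occupied; the Seat and RoomLayout names, the seat identifiers, and the first-free-seat policy are all illustrative assumptions rather than the disclosed method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Seat:
    seat_id: str
    occupied_by: Optional[str] = None  # None means the chair is free


@dataclass
class RoomLayout:
    seats: List[Seat]

    def place(self, participant: str) -> Optional[str]:
        """Return the seat for `participant`, assigning one if needed.

        A participant who already has a seat keeps it, so the placement
        persists for the whole session. Returns None if no seat is free.
        """
        for seat in self.seats:
            if seat.occupied_by == participant:
                return seat.seat_id
        for seat in self.seats:
            if seat.occupied_by is None:
                seat.occupied_by = participant
                return seat.seat_id
        return None
```

For example, if one of three detected chairs is occupied by a person in the room, the first two remote participants are given the remaining chairs, and asking again for either of them returns the same chair.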

Moreover, the present disclosure supports semantic segmentation, enabling a direct connection of pixel(s) in an image to one or more of a particular category, e.g., a chair, human, desk, table, whiteboard, screen display, or the like. In one or more implementations, bounding boxes or specific selections of pixels are used to link such classes of represented images. Thereafter, such objects can be masked and/or replaced with augmented reality, including to replace displays, humans, or virtually anything with a virtual representation of something or someone.
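As a rough illustration of linking pixels to categories, the sketch below assumes a segmentation model has already labeled every pixel with a category name, and derives both a per-class pixel mask and a bounding box from that label map. The label strings and the helper names class_mask and bounding_box are hypothetical.

```python
from typing import List, Optional, Tuple

LabelMap = List[List[str]]          # one category name per pixel
Box = Tuple[int, int, int, int]     # (top, left, bottom, right), inclusive


def class_mask(labels: LabelMap, target: str) -> List[List[int]]:
    """Return a binary mask that is 1 wherever the pixel label equals `target`."""
    return [[1 if label == target else 0 for label in row] for row in labels]


def bounding_box(mask: List[List[int]]) -> Optional[Box]:
    """Return the smallest box enclosing all 1-pixels, or None if there are none."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(rows), min(cols), max(rows), max(cols))
```

The resulting mask or box marks the region that could then be masked out or overlaid with a virtual representation, such as a shared screen in place of a detected display.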

In one or more implementations, physical objects, such as whiteboards, can be automatically detected and an interactive collaborative whiteboard can be virtually overlaid in its place. Alternatively (or in the same implementation), users can place a virtual surface showing a whiteboard or a shared screen in a 3-D representation of the space of a room.

Thus, one or more scans of a room can result in detection of physical articles and people, such as a whiteboard, and such scans can be used to define and implement a further augmented layer. Additionally, one or more of the display screens in the room can be detected and virtually replaced by a shared screen component from any participant's computing device in the meeting. As a user writes or draws on a virtual (e.g., digital) collaborative whiteboard, the respective strokes automatically appear on the virtually represented physical whiteboard when viewed via a device capable of providing an augmented reality view, such as by a user wearing a virtual reality headset or glasses, or when provided on a display screen. Furthermore, when a person who is physically located with the physical whiteboard physically writes or draws on the whiteboard, such as with a dry erase marker, a view of the whiteboard can be captured, such as by a camera. Thereafter, software executed by one or more processor(s) configures the processor(s) to update the virtually represented whiteboard such that the new writings/drawings can be displayed by, and registered with, the digital board.

Thus, the present disclosure can process video content to alter the video to include configured (e.g., sized and manipulated) representations of people for placement in particular places, such as in chairs. Individuals can appear to be placed in physical environments as a function of augmented reality, and/or placed in virtual locations as a function of virtual reality. The result includes improved collaboration on a scale and format that was heretofore not possible.

Referring to FIG. 1, a diagram is provided that shows an example hardware arrangement that operates for providing the systems and methods disclosed herein, and designated generally as system 100. System 100 can include one or more data processing apparatuses 102 that are at least communicatively coupled to one or more user computing devices 104 across communication network 106. Data processing apparatuses 102 and user computing devices 104 can include, for example, mobile computing devices such as tablet computing devices, smartphones, personal digital assistants or the like, as well as laptop computers and/or desktop computers. Further, one computing device may be configured as a data processing apparatus 102 and a user computing device 104, depending upon operations being executed at a particular time. In addition, an audio/visual capture device 105 is depicted in FIG. 1, which can be configured with one or more cameras (e.g., front-facing and rear-facing cameras), a microphone, a microprocessor, and one or more communications modules, and that is coupled to data processing apparatus 102. The audio/visual capture device 105 can be configured to interface with one or more data processing apparatuses 102 for producing high quality and interactive multimedia content, and supporting interactive video conferencing.

With continued reference to FIG. 1, data processing apparatus 102 can be configured to access one or more databases for the present disclosure, including image files, video content, documents, audio/video recordings, metadata and other information. However, it is contemplated that data processing apparatus 102 can access any required databases via communication network 106 or any other communication network to which data processing apparatus 102 has access. Data processing apparatus 102 can communicate with devices comprising databases using any known communication method, including a direct serial, parallel, universal serial bus (“USB”) interface, or via a local or wide area network.

User computing devices 104 communicate with data processing apparatuses 102 using data connections 108, which are respectively coupled to communication network 106. Communication network 106 can be any communication network, but is typically the Internet or some other global computer network. Data connections 108 can be any known arrangement for accessing communication network 106, such as the public Internet, private Internet (e.g., VPN), dedicated Internet connection, or dial-up serial line interface protocol/point-to-point protocol (SLIP/PPP), integrated services digital network (ISDN), dedicated leased-line service, broadband (cable) access, frame relay, digital subscriber line (DSL), asynchronous transfer mode (ATM) or other access techniques.

User computing devices 104 preferably have the ability to send and receive data across communication network 106, and are equipped with web browsers, software applications, or other means, to provide received data on display devices incorporated therewith. By way of example, user computing devices 104 may be personal computers such as Intel Pentium-class and Intel Core-class computers or Apple Macintosh computers, tablets, smartphones, but are not limited to such computers. Other computing devices which can communicate over a global computer network such as palmtop computers, personal digital assistants (PDAs) and mass-marketed Internet access devices such as WebTV can be used. In addition, the hardware arrangement of the present invention is not limited to devices that are physically wired to communication network 106, and wireless communication can be provided between wireless devices and data processing apparatuses 102. In addition, system 100 can include Internet media extender 110 that is communicatively coupled to television 112, such as via a high-definition multimedia interface (“HDMI”) or other connection.

According to an embodiment of the present disclosure, user computing device 104 provides user access to data processing apparatus 102 for the purpose of receiving and providing information. The specific functionality provided by system 100, and in particular data processing apparatuses 102, is described in detail below.

System 100 preferably includes software that provides functionality described in greater detail herein, and preferably resides on one or more data processing apparatuses 102 and/or user computing devices 104. One of the functions performed by data processing apparatus 102 is that of operating as a web server and/or a web site host. Data processing apparatuses 102 typically communicate with communication network 106 across a permanent, i.e., un-switched, data connection 108. Permanent connectivity ensures that access to data processing apparatuses 102 is always available.

FIG. 2 illustrates, in block diagram form, an exemplary data processing apparatus 102 and/or user computing device 104 that can provide various functionality, as shown and described herein. Although not expressly indicated, one or more features shown and described with reference to FIG. 2 can be included with or in the audio/visual capture device 105, as well. Data processing apparatus 102 and/or user computing device 104 may include one or more microprocessors 205 and connected system components (e.g., multiple connected chips) or the data processing apparatus 102 and/or user computing device 104 may be a system on a chip.

The data processing apparatus 102 and/or user computing device 104 includes memory 210 which is coupled to the microprocessor(s) 205. The memory 210 may be used for storing data, metadata, and programs for execution by the microprocessor(s) 205. The memory 210 may include one or more of volatile and non-volatile memories, such as Random Access Memory (“RAM”), Read Only Memory (“ROM”), Flash, Phase Change Memory (“PCM”), or other type. The data processing apparatus 102 and/or user computing device 104 also includes an audio input/output subsystem 215 which may include one or more microphones and/or speakers.

A display controller and display device 220 provides a visual user interface for the user; this user interface may include a graphical user interface which, for example, is similar to that shown on a Macintosh computer when running Mac OS operating system software or an iPad, iPhone, or similar device when running iOS operating system software.

The data processing apparatus 102 and/or user computing device 104 also includes one or more wireless transceivers 230, such as an IEEE 802.11 transceiver, an infrared transceiver, a Bluetooth transceiver, a wireless cellular telephony transceiver (e.g., 1G, 2G, 3G, 4G), or a transceiver for another wireless protocol, to connect the data processing system 100 with another device, external component, or a network. In addition, Gyroscope/Accelerometer 235 can be provided.

It will be appreciated that one or more buses may be used to interconnect the various modules in the block diagram shown in FIG. 2.

The data processing apparatus 102 and/or user computing device 104 may be a personal computer, tablet-style device, such as an iPad, a personal digital assistant (PDA), a cellular telephone with PDA-like functionality, such as an iPhone, a Wi-Fi based telephone, a handheld computer which includes a cellular telephone, a media player, such as an iPod, an entertainment system, such as an iPod touch, or devices which combine aspects or functions of these devices, such as a media player combined with a PDA and a cellular telephone in one device. In other embodiments, the data processing apparatus 102 and/or user computing device 104 may be a network computer or an embedded processing apparatus within another device or consumer electronic product.

The data processing apparatus 102 and/or user computing device 104 also includes one or more input or output (“I/O”) devices and interfaces 225 which are provided to allow a user to provide input to, to receive output from, and/or to transfer data to and from the system. These I/O devices may include a mouse, keypad or a keyboard, a touch panel or a multi-touch input panel, camera, network interface, modem, other known I/O devices or a combination of such I/O devices. The touch input panel may be a single touch input panel which is activated with a stylus or a finger or a multi-touch input panel which is activated by one finger or a stylus or multiple fingers, and the panel is capable of distinguishing between one or two or three or more touches and is capable of providing inputs derived from those touches to the data processing apparatus 102 and/or user computing device 104. The I/O devices and interfaces 225 may include a connector for a dock or a connector for a USB interface, FireWire, etc. to connect the system 100 with another device, external component, or a network. Moreover, the I/O devices and interfaces can include gyroscope and/or accelerometer 227, which can be configured to detect 3-axis angular acceleration around the X, Y and Z axes, enabling precise calculation, for example, of yaw, pitch, and roll. The gyroscope and/or accelerometer 227 can be configured as a sensor that detects acceleration, shake, vibration, shock, or fall of a device 102/104, for example, by detecting linear acceleration along one of three axes (X, Y and Z). The gyroscope can work in conjunction with the accelerometer, to provide detailed and precise information about the device's axial movement in space. More particularly, the 3 axes of the gyroscope combined with the 3 axes of the accelerometer enable the device to recognize approximately how far, fast, and in which direction it has moved to generate telemetry information associated therewith, and that is processed to generate coordinated presentations, such as shown and described herein.
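A simplified sketch of how six-axis samples might be turned into telemetry follows. It makes several assumptions that are not stated in the disclosure: the gyroscope reports angular rate in degrees per second, the accelerometer reports linear acceleration in meters per second squared with gravity already removed, samples arrive at a fixed interval dt, and plain Euler integration is acceptable. Real devices typically need sensor fusion and drift correction, which this sketch omits. The Telemetry type and integrate function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Telemetry:
    roll: float      # degrees, rotation about X
    pitch: float     # degrees, rotation about Y
    yaw: float       # degrees, rotation about Z
    velocity: Vec3   # meters per second
    position: Vec3   # meters from the starting point


def integrate(samples: Iterable[Tuple[Vec3, Vec3]], dt: float) -> Telemetry:
    """Accumulate (gyro, accel) samples taken every `dt` seconds."""
    roll = pitch = yaw = 0.0
    vel = [0.0, 0.0, 0.0]
    pos = [0.0, 0.0, 0.0]
    for gyro, accel in samples:
        # Angular rate times elapsed time gives the change in angle.
        roll += gyro[0] * dt
        pitch += gyro[1] * dt
        yaw += gyro[2] * dt
        for axis in range(3):
            vel[axis] += accel[axis] * dt   # acceleration -> velocity
            pos[axis] += vel[axis] * dt     # velocity -> displacement
    return Telemetry(roll, pitch, yaw, tuple(vel), tuple(pos))
```

The accumulated angles indicate which direction the device has turned, and the velocity and position estimates indicate approximately how fast and how far it has moved.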

Additional components, not shown, can also be part of the data processing apparatus 102 and/or user computing device 104, and, in certain embodiments, fewer components than that shown in FIG. 2 may also be used in data processing apparatus 102 and/or user computing device 104. It will be apparent from this description that aspects of the inventions may be embodied, at least in part, in software. That is, the computer-implemented methods may be carried out in a computer system or other data processing system in response to its processor or processing system executing sequences of instructions contained in a memory, such as memory 210 or other machine-readable storage medium. The software may further be transmitted or received over a network (not shown) via a network interface device 225. In various embodiments, hardwired circuitry may be used in combination with the software instructions to implement the present embodiments. Thus, the techniques are not limited to any specific combination of hardware circuitry and software, or to any particular source for the instructions executed by the data processing apparatus 102 and/or user computing device 104.

In one or more implementations, the present disclosure provides improved processing techniques to prevent packet loss, to improve handling of interruptions in communications, and to reduce or eliminate latency and other issues associated with wireless technology. For example, in one or more implementations Real Time Streaming Protocol (RTSP) can be implemented, for example, for sharing output associated with a camera, microphone and/or other output devices configured with a computing device. RTSP is an effective (though not necessary in all implementations) network control protocol for entertainment and communications systems, including in connection with streaming output. RTSP is used in the present disclosure, at least in part, for establishing and controlling media sessions between various end points, including user computing devices 104, Internet media extender 110 and data processing apparatus 102.

In addition to RTSP, one or more implementations of the present disclosure can be configured to use Web Real-Time Communication (“WebRTC”) to support browser-to-browser applications, including in connection with voice, video chat, and peer-to-peer (“P2P”) file sharing. Thus, the present disclosure avoids a need for either internal or external plugins to connect endpoints, including for voice/video or other communication sharing. In one or more implementations, the present disclosure implements WebRTC for applications and/or Internet web sites to capture and/or stream audio and/or video media, as well as to exchange data between browsers without requiring an intermediary. The set of standards that comprises WebRTC makes it possible to share data and perform teleconferencing peer-to-peer, without requiring that the user install plug-ins or any other third-party software. WebRTC includes several interrelated APIs and protocols which work together.

In one or more implementations, at least one of the Internet media extender components 110 includes APPLE TV. After an Internet media extender 110 is installed (e.g., connected to a television set and connected to a Wi-Fi, Ethernet or other local area network), a software application is installed on the Internet media extender 110, as well as at least one mobile computing device 104. For example, a user downloads and installs an app to an Internet media extender 110 (“TV APP”) and also installs an app to a user computing device 104 (“MOBILE APP”). Once installed, and the first time the TV APP is executed, the user is prompted to launch the MOBILE APP. Thereafter, the mobile computing device 104 (e.g., an iPhone) is automatically detected by the TV APP. During subsequent uses, video content that is provided as a function of audio/video output from the computing device (e.g., iPhone) is provided instantly on the television that is connected to the Internet media extender 110. In operation, the audio/video feed from the iPhone is provided on the big screen. The TV APP and the MOBILE APP may be configured as a single application (e.g., distributed as a single application), or may be provided as separate applications.

In one or more implementations, each of a plurality of participants operating, for example, user computing device 104 participates in an interactive video conference at least in part by establishing a data/communication session with the data processing apparatus 102. A form of a star topology is established, in which data processing apparatus 102 is communicatively connected to each of a plurality of respective user computing devices 104 and respectively receives an audio/video feed from each device, such as provided as a function of input from a respective camera and/or microphone.

Thus, in one or more implementations, the present disclosure can implement a star topology in which a central node (e.g., a data processing apparatus 102) receives low-resolution video content from each of a plurality of computing devices (e.g., client devices 104). The central node can be configured by executing program instructions to compose a single video comprising all of the video received from the various devices. The single video can be provided substantially in real-time as one high-definition (“HD”) video. The central node can send the HD video to all of the computing devices operated by the various users, as well as to the device operated by the “presenter.”

Continuing with the respective one or more implementations described above, each of the respective individual feeds from the respective devices is received by the data processing apparatus 102 and the video feeds (including, for example, images) are composed into a single video stream. The video stream can be configured as a high definition stream (e.g., 1280×720 or higher resolution), and output to each of at least some of the respective user computing devices 104.
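One way a central node might decide where each incoming feed goes inside the single composed frame is sketched below. The near-square grid policy and the tile_layout name are assumptions made for illustration; the disclosure does not specify a layout. The default output size follows the 1280×720 example given above.

```python
import math
from typing import List, Tuple

Tile = Tuple[int, int, int, int]  # (x, y, width, height) in output pixels


def tile_layout(feed_count: int, width: int = 1280, height: int = 720) -> List[Tile]:
    """Return one tile per feed, arranged row by row in a near-square grid."""
    if feed_count <= 0:
        return []
    cols = math.ceil(math.sqrt(feed_count))
    rows = math.ceil(feed_count / cols)
    tile_w, tile_h = width // cols, height // rows
    return [
        ((i % cols) * tile_w, (i // cols) * tile_h, tile_w, tile_h)
        for i in range(feed_count)
    ]
```

With four feeds, each low-resolution input would be scaled into a 640×360 quadrant of the 1280×720 output before the composed stream is sent back to the participating devices.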

FIG. 3 is an example high-level diagram that illustrates interactivity between various ones of the devices illustrated in FIG. 1 and identifies example communication protocols in one or more implementations of the present disclosure. The implementation illustrated in FIG. 3 is usable as a consumer (e.g., a residential) implementation, as well as an enterprise implementation. As illustrated in FIG. 3, WebRTC is shown with regard to communications between user computing devices 104 (shown as a CHROME BOOK and mobile computing device, e.g., a smart phone) and supporting browser-to-browser applications and P2P functionality. In addition, RTSP is utilized in connection with user computing devices 104 and Internet media extender 110, thereby enabling presentation of audio/video content from devices 104 on television 112.

In one or more implementations, HTTP Live Streaming (“HLS”) is utilized for HTTP-based media streaming. In addition or in the alternative, adaptive bit rate HLS is utilized, such that a portion of the stream is available in a plurality of encoding sizes and resolutions for effective receipt regardless of device or bandwidth. As known in the art, HLS is usable to parse a stream into a sequence of small HTTP-based file downloads, each download comprising a portion of the stream. As the stream plays, a client device can select from a number of different alternate streams containing the same material encoded at a variety of data rates, allowing the streaming session to adapt to an available data rate. An M3U playlist containing the metadata for the various sub-streams which are available for download is also provided and downloaded.
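The client-side selection among alternate streams can be sketched as follows. The example reads the BANDWIDTH attribute of each EXT-X-STREAM-INF entry in a master playlist and picks the highest-rate variant that fits a measured throughput. The sample playlist, its URIs, and the function names parse_variants and choose_variant are made up for illustration, and the parser handles only the subset of the playlist format needed here.

```python
import re
from typing import List, Tuple

# Hypothetical master playlist with two variants of the same material.
SAMPLE_MASTER = """#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
low/index.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=2500000,RESOLUTION=1280x720
hd/index.m3u8
"""


def parse_variants(master: str) -> List[Tuple[int, str]]:
    """Return (bandwidth_bps, uri) pairs sorted from lowest to highest rate."""
    lines = [line.strip() for line in master.splitlines() if line.strip()]
    variants = []
    for i, line in enumerate(lines):
        if not line.startswith("#EXT-X-STREAM-INF:"):
            continue
        # Require ':' or ',' before the name so AVERAGE-BANDWIDTH is not matched.
        match = re.search(r"[:,]BANDWIDTH=(\d+)", line)
        has_uri = i + 1 < len(lines) and not lines[i + 1].startswith("#")
        if match and has_uri:
            variants.append((int(match.group(1)), lines[i + 1]))
    return sorted(variants)


def choose_variant(variants: List[Tuple[int, str]], measured_bps: int) -> str:
    """Pick the best variant that fits, falling back to the lowest rate."""
    if not variants:
        raise ValueError("playlist contains no variants")
    fitting = [v for v in variants if v[0] <= measured_bps]
    return (fitting[-1] if fitting else variants[0])[1]
```

A client would typically repeat the selection as its throughput estimate changes, switching variants at segment boundaries so that playback adapts to the available data rate.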

In the example illustrated in FIG. 3, a respective computing device 104 is illustrated as the origin or host (also referred to herein, generally, as a “presenter” device) that is executing the MOBILE APP and defining a session for respective other devices to use for, for example, interactive video conferencing. In one or more implementations, the origin or host device 104 establishes an initial session and options are provided to invite other users (e.g., user computing devices 104 that are configured with the MOBILE APP) to join the conferencing session. Users who are invited to join the session can further invite other users, for example, when permission for extending invitations to other users has been made available (e.g., enabled) by the origin or host device. Accordingly, the MOBILE APP can include instructions for enabling a processor associated with each respective user computing device 104 to generate and transmit invitations for users to join a respective conferencing session. In accordance with the present disclosure, video sharing, video conferencing, sharing of multimedia content, data, documents and various files is supported, as shown and described in greater detail herein.

In one or more implementations, a plurality of interactive communication sessions can be defined by an origin or host device. Each of the respective sessions can be defined and identified using a specific title or label. For example, “#APPDEV” can be used to define and identify an interactive communication session having a topic dedicated to software application development. The origin or host device can, thereafter, transmit invitations to computing devices 104 associated with software engineers and other relevant parties to join the #APPDEV session. Videoconferencing technology in accordance with the present disclosure is, thereafter, available for the software engineers and other relevant parties and the user of the origin or host device 104, such as to confer over topics associated with software application development. Similarly, the same respective origin or host device 104 can define another respective session with a different topical focus, e.g., sales and entitled #SALES. Invitations to the #SALES session can be transmitted by the origin or host device to computing devices 104 associated with individuals in a sales and marketing department. Videoconferencing technology in accordance with the present disclosure is, thereafter, available for those one or more individuals to confer about topics associated with sales. In one or more implementations, at least part of the respective groups of users is mutually exclusive, in that members of the #APPDEV session cannot participate in the #SALES session, and at least some of the members of the #SALES session cannot participate in the #APPDEV session.

In operation, and in accordance with one or more implementations, after an invitation is sent to a user of a computing device 104 for joining a session defined by a respective topic (e.g., #TOPIC), the user affirmatively accepts the invitation and is, thereafter, authorized to join the session. Thereafter, the user can select, via the MOBILE APP, an identifier representing the respective session (e.g., #TOPIC) provided, which causes the user computing device 104 to execute one or more instructions that enable the device 104 to connect to and/or join the session, and access materials associated therewith. Moreover, in one or more implementations, rules can be defined and/or enforced that restrict access to sessions and/or content to respective users. For example, a session defined as #TOPIC may be shared by seven users; however, rules can be defined by a user of the origin or host computing device 104 and implemented that restrict all but three of the users from real-time video conferencing via the #TOPIC session. Content associated with the #TOPIC session, however, can be made available to all seven of the users. Materials associated with a respective session can be stored (e.g., backed up) remotely, e.g., in the “cloud” and be available for access, archived and/or made available for users in the future. Such materials can be restricted from future access, as well.
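By way of non-limiting illustration, the #TOPIC example above, in which seven users share a session but only three are permitted to conference by real-time video, can be sketched as a simple rule check. The Session class, its field names, and the user names are hypothetical and serve only to illustrate one way such rules might be represented.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Hypothetical record of a labeled session and its access rules."""
    label: str
    members: set = field(default_factory=set)        # users who accepted an invitation
    video_allowed: set = field(default_factory=set)  # subset permitted real-time video

    def accept_invitation(self, user, allow_video=False):
        """Record that a user accepted, optionally granting real-time video."""
        self.members.add(user)
        if allow_video:
            self.video_allowed.add(user)

    def can_access_content(self, user):
        # Content associated with the session is available to every member.
        return user in self.members

    def can_join_video(self, user):
        # Real-time video requires membership and an explicit grant by the host.
        return user in self.members and user in self.video_allowed


# Seven users share #TOPIC; only the first three may conference by video.
topic = Session("#TOPIC")
for index in range(7):
    topic.accept_invitation(f"user{index}", allow_video=index < 3)
```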

It will be appreciated by one of ordinary skill that countless sessions can be defined for topics of all sorts and usable for respective groups of users. Demarcating individual sessions in accordance with the present disclosure is usable in the consumer and enterprise markets alike, and solves a long-felt need associated with controlled communication sessions including interactive video among respective individuals and parties.

In certain implementations, a plurality of parties can connect to a respective videoconferencing session in accordance with the present disclosure. In such instances, a client/server model can be effective for handling the respective devices and management of data and communications thereof. In other certain implementations, one or a few individuals connect to a respective videoconferencing session in accordance with the present disclosure. In such cases, a P2P framework can be effective in an implementation.

FIGS. 4-15 illustrate example implementations of the present disclosure. FIG. 4 illustrates an example user speaking and being captured in images by a respective device, e.g., user computing device 104. In the example shown in FIG. 4, the user is located in a room in front of several windows, a lamp, and tables. The user is remote from a physical conference room where other participants are physically located, and the user is participating virtually via an online video conference in accordance with the teachings herein.

FIGS. 5 and 6 illustrate the physical conference room that is remotely located from the user, and a user computing device 104 operated by a user who is physically located in the conference room. An augmented reality version of the conference room is displayed on the user computing device 104, and the remotely located user is shown as if physically seated at the table in the conference room. More particularly, elements that were captured in FIG. 4, such as the background elements (windows, lamp, tables, etc.), have been removed and the user has been extracted therefrom. This provides a more realistic view of the user as if being physically present and located in the conference room.

FIGS. 7-9 show the user appearing seated at the table, and demonstrate the versatility of the present disclosure. In the views shown in FIGS. 7-9, the distance and angle of view change as the user who is physically located in the conference room moves about while continuing to capture video content of the conference room. Even though the angle of view and distance change, the position of the remotely located user is virtually represented consistently. This can be accomplished as a function of one or more instructions being executed by one or more processors to provide augmented reality and altered video content as a function of a determination of the location of the respective seat in which the remotely located user appears to be sitting. Representing the remotely located user in this way provides for substantially greater realism, even as the user computing device 104 operated by the user who is physically located in the conference room moves about the location.

For example, a person viewing an augmented reality of a virtually placed participant in a conference room moves about. During that movement, images of the virtually placed participant adjust, such as with regard to an appropriate skew and angle. In this way, the virtually placed participant appears as (s)he would if the participant were physically located in the conference room. This transformative action can occur for anything seen in the virtual conference, including objects. In one or more implementations, similar transformations can occur with audio content, such as to adjust volume, distortion, echo (or reverb), or the like, to virtually represent an audio experience for a user.
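By way of non-limiting illustration, the adjustment for skew and angle described above can be sketched with a simple pinhole camera model, in which the image of the virtually placed participant is treated as a flat rectangle fixed at the seat and re-projected for each viewer position. The pinhole model, the coordinate conventions, and the names project_point, project_quad and edge_heights are assumptions made for this sketch; the present disclosure does not prescribe a particular projection technique.

```python
import math

# Hypothetical flat "billboard" for the participant, 1 unit wide and 1 unit tall,
# fixed 3 units in front of the origin. Corner order: bottom-left, bottom-right,
# top-right, top-left. Coordinates are (x, y, z) with y pointing up.
QUAD = [(-0.5, 0.0, 3.0), (0.5, 0.0, 3.0), (0.5, 1.0, 3.0), (-0.5, 1.0, 3.0)]


def project_point(point, camera_pos, yaw_rad, focal=1.0):
    """Project a world point into 2D image coordinates for a pinhole camera."""
    x = point[0] - camera_pos[0]
    y = point[1] - camera_pos[1]
    z = point[2] - camera_pos[2]
    # Rotate about the vertical axis so the camera's viewing direction is +z.
    cos_y, sin_y = math.cos(yaw_rad), math.sin(yaw_rad)
    xc = cos_y * x - sin_y * z
    zc = sin_y * x + cos_y * z
    if zc <= 0:
        raise ValueError("point is behind the camera")
    return (focal * xc / zc, focal * y / zc)


def project_quad(corners, camera_pos, look_at, focal=1.0):
    """Project all corners for a camera at camera_pos aimed at look_at."""
    yaw = math.atan2(look_at[0] - camera_pos[0], look_at[2] - camera_pos[2])
    return [project_point(c, camera_pos, yaw, focal) for c in corners]


def edge_heights(projected):
    """Return the on-screen heights of the left and right edges of the quad."""
    left = projected[3][1] - projected[0][1]
    right = projected[2][1] - projected[1][1]
    return left, right
```

Viewed head-on, the two vertical edges of the rectangle project to equal heights. Viewed from one side, the nearer edge projects taller than the farther edge, which is the skew a viewer would expect from a physically present participant.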

FIGS. 10-13 illustrate an example implementation in which an additional two remotely located persons are shown virtually sitting next to each other at the conference room table shown in FIGS. 5-9, as a function of augmented reality.

FIG. 14 illustrates an example implementation in which the additional two remotely located persons shown in FIGS. 10-13 are shown virtually sitting at a conference room table, while other persons who are physically present in the conference room appear with them. This demonstrates an example of the present disclosure integrating people who are physically located in a conference room with those who are located remotely. As can be seen, the respective users appear as they would if physically present, naturally and realistically.

FIG. 15 illustrates an example implementation of the present disclosure that includes virtual projector 1502 that appears to project representations of the remotely located persons of FIGS. 12 and 13. FIGS. 16A-16B illustrate an example implementation of the present disclosure. FIG. 16A includes the virtually provided projector 1602 that is usable to illustrate projected representations of a remotely located person. As shown in FIG. 16A, a computing device provides the augmented reality version of the conference room with a remotely located participant represented vis-à-vis the virtual projection device 1602. FIG. 16B illustrates the conference room in its actual, physical form. In FIG. 16B, no computing device is shown providing the augmented reality version, hence the projection device 1602 and participant are not shown as those were virtually presented as a function of augmented reality.

FIGS. 17A-17E illustrate an example implementation of the present disclosure and include a whiteboard. As shown in FIG. 17A, a physical whiteboard 1702 that is located in a conference room is shown, as is a virtual representation of the whiteboard 1704, which is provided as a function of augmented reality. The whiteboard 1704 is provided to all participants (whether remotely or physically located), as a function of any respective computing device that is configured to provide an augmented reality, virtual representation. Any of the participants can interact via the virtual whiteboard 1704, including to draw on their respective computing devices, and the virtual whiteboard 1704 is provided to each of the members in the updated state. For example, FIG. 17B illustrates the virtual whiteboard 1704 in the same state as the physical whiteboard 1702 because no participant has updated the virtual whiteboard 1704, such as by drawing on it physically or virtually. FIG. 17C illustrates the virtual whiteboard 1704 in an updated state. Markings 1706 are made by a participant (whether or not located remotely), using the participant's respective computing device showing the whiteboard 1704. FIG. 17D illustrates the whiteboard 1704 in a further updated state with additional markings 1706, which can be made by the one participant or by one or more of a plurality of participants using respective computing device(s). Regardless, the physical whiteboard 1702 remains unchanged, as shown in FIG. 17E. In this way, virtual collaboration is provided as a function of augmented reality and in connection with physically located objects, such as whiteboard 1702.

In addition, physical markings, such as on whiteboard 1702, can be integrated with a virtual whiteboard (e.g., whiteboard 1704). For example, based on the geometry of the board, an image of the whiteboard 1702 can be adjusted (e.g., distorted) to appear as a flat image, and then superimposed essentially as a background image on the existing collaborative virtual whiteboard 1704. The superimposed image of the whiteboard 1702 can be further processed, such as to ensure that the background is truly white, as opposed for example to a shade of grey, to ensure proper blending into the virtual whiteboard 1704. Blending processes can be used to eliminate the white portion of the image, with just the (e.g., black) writing being extracted and then added to the virtual board 1704. Blending processes are usable, for example, to remove backgrounds and to extract just markings (e.g., writings and drawings) from a whiteboard, and to provide the extracted markings virtually in whiteboards that are digitally represented. For example, as a user physically writes on a whiteboard, a stream of video content (e.g., via a camera) is generated. The background of the video content (e.g., the blank whiteboard portion) is removed and the extracted writing is transmitted to a computing device to be used in an augmented reality version of a whiteboard. In this way, two physical whiteboards can be kept synchronized, as markings from one whiteboard can be displayed virtually on another whiteboard, and vice-versa.
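By way of non-limiting illustration, the removal of the board background and the extraction of just the markings can be sketched as a brightness threshold taken relative to the estimated board color, so that a board photographed as a shade of grey is still treated as background. The sketch assumes a grayscale image that has already been flattened and is represented as rows of 0-255 values; the median-based background estimate, the darkness_ratio parameter, and the function names are hypothetical choices for this example and not a technique named in the present disclosure.

```python
from statistics import median


def extract_markings(gray, darkness_ratio=0.6):
    """Return a mask that is True where a pixel is dark enough to be writing.

    The board color is estimated as the median brightness, which assumes that
    most of the image is blank board rather than markings.
    """
    background = median(px for row in gray for px in row)
    threshold = background * darkness_ratio
    return [[px < threshold for px in row] for row in gray]


def add_to_virtual_board(virtual, mask, ink=0):
    """Draw the extracted markings onto a copy of the virtual whiteboard."""
    return [
        [ink if marked else px for px, marked in zip(v_row, m_row)]
        for v_row, m_row in zip(virtual, mask)
    ]


# A greyish physical board (value 180) with two dark stroke pixels.
photo = [
    [180, 180, 180, 180],
    [180, 30, 40, 180],
    [180, 180, 180, 180],
]
blank_virtual = [[255] * 4 for _ in range(3)]
updated_virtual = add_to_virtual_board(blank_virtual, extract_markings(photo))
```

Because only the mask of markings is carried over, the grey cast of the photographed board never reaches the virtual whiteboard, which remains truly white wherever no writing was detected.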

In one or more implementations, a virtual control panel can be provided for users of user computing devices to define settings for writing/drawing on a virtual whiteboard. For example, a control panel can be provided for a user to select a line width, transparency, color, or other features. Once selected, the user can write/draw on a computing device, such as in a location on the device's display wherein a whiteboard is provided. Once the location of a virtual whiteboard in a scene is determined, such as described herein, the user's marking of the whiteboard can be represented as the user writes/draws with a finger, stylus, or other suitable tool in that respective location. The appearance of the writing/drawing can be defined as a function of selections made in the respective control panel, such as in connection with the line width, color, transparency, or the like.

In one or more implementations, machine learning is usable to detect various objects in video content, such as a whiteboard, chairs, tables, lamps, and people. Once detected, the respective planes of the respective objects can be defined for manipulating the objects in virtual environments. Alternatively, or in addition, a graphical user interface operating on a user computing device can be provided for a user to select or define (via boxes, circles, or other predefined or free-form selection shapes) the respective objects. Thereafter, the objects and their respective planes can be identified and defined by a processor executing one or more software instructions, and the objects can be manipulated, virtually, including as shown and described herein.

FIG. 18A illustrates a physical display device 1802, which is represented virtually on a participant's computing device as virtual display device 1804. While virtual display device 1804 is represented substantially in the same state as physical display device 1802 in FIG. 18A, it is possible for any respective participant to alter the content displayed on virtual display device 1804. For example, FIG. 18B illustrates virtual display device 1804 in a first state (which is substantially the same as physical display device 1802). FIG. 18C illustrates virtual display device 1804 showing completely different content, which can be a respective participant's computing device or other display. In this way, content on a respective display device can be altered substantially in real time and virtually as a function of augmented reality.

With reference now to one or more implementations, FIG. 19A illustrates an example implementation of the present disclosure configured to project composite representations of the remotely located persons. In one arrangement, the composite representation provided is that of the participants, including remotely located persons depicted in FIGS. 13 and 14. For example, each of the composite representations includes a user representation portion 1904 and a generated portion 1902. The user representation portion 1904 includes, in one or more implementations, the representation of the remote user as described and depicted in FIGS. 9, 13 and 14. The user representation portion 1904 matches the user's movements and/or orientation that occur, such as in the remote location. For example, when the remote user gestures or moves his/her head, the user representation portion 1904 provides a corresponding depiction.

In one or more configurations, the user representation portion 1904 only depicts a portion of the remote user. For example, the user representation portion 1904 depicts the body of the remote user, but not the head of the user.

As further shown in FIG. 19A, the generated portion 1902 is an artificial, synthetic or computer-generated image or representation that is displayed along with the user representation portion 1904. In one arrangement, the generated portion 1902 is an avatar, image, computer generated image or animation, icon or other similar type of visual construct. In one arrangement, the generated portion 1902 can be animated or altered based on data received from one or more sources.

In a further arrangement shown in FIG. 19A, multiple generated portion(s) 1902 are depicted such that multiple remote users are each provided with an individual generated portion 1902 associated with the corresponding user representation portion 1904. Alternatively, not every user representation portion 1904 need be associated with a generated portion. In one arrangement, each generated portion 1902 is the same. In an alternative arrangement, different generated portions 1902 can be assigned to different remote users. Here, each generated portion 1902 can be automatically assigned to a user representation portion 1904 based on a roster of available generated portions 1902. Alternatively, the user can provide a desired generated portion 1902 to be used in the corresponding depiction of the user representation 1904.

As shown in FIG. 19A, the generated portion 1902 is presented as an overlay of a portion of the user representation portion 1904. In one arrangement, the generated portion overlays the entire user representation 1904. In the depicted configuration, the generated portion 1902 is positioned so as to obscure the head of the user as depicted in the user representation portion 1904. The generated portion 1902 is moved or altered so as to track the position of the user's head so that the user's head is not visible during the composite representation.

As shown in FIG. 19B, the user representation portion 1904 need not include a complete representation of the user; for example, the user representation portion can be only a depiction of the user from the neck down. Here, the generated portion 1902 does not overlay the user representation portion 1904; instead, the generated portion 1902 is aligned with a relevant user marker or location in the user representation portion 1904. For instance, when the generated portion 1902 is depicted as an animated face and head, the generated portion 1902 is aligned with the neck of the user depicted in the user representation portion 1904.

As shown in FIGS. 19A-B, a projection device, such as projection device 1502, may be used to generate and/or project representations of the remotely located persons.

As shown in FIG. 19C, where the generated portion 1902 is a computer-generated head and face, the facial arrangements can be altered to mimic facial arrangements of the remote user. By way of non-limiting example, the state of the user's face, such as smiling, frowning, closing one's eyes, or speaking, can be mimicked by the generated portion 1902. For example, the animation frame progression depicted in key frames 1903-1907 tracks the facial arrangement and/or movements of the remote user. Here, the shape, size or orientation of the remote user's facial features are mimicked by the generated portion 1902.

In yet a further depiction, as shown in FIG. 19D, the generated portion 1902 tracks the relative position, size and orientation of some or all of the user representation portion 1904. By way of non-limiting example, the generated portion 1902 is able to maintain proper scaling of the depicted avatar based on the proximity of the user to a viewer of a display device. For instance, where a user moves closer to the viewer, the generated portion 1902 increases, in proportion, based on the distance depicted between the user and the viewer of the display device. For example, the animation frame progression depicted in key frames 1909-1913 provides that the relative size of the generated portion 1902 increases or decreases relative to the viewing frame, depending on the proximity of the user to the viewer of the display device. As illustrated in the animation frames, as a user moves towards the viewer of the display device, the apparent size of the generated portion 1902 will increase dynamically so as to match the proportions of the user representation.
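By way of non-limiting illustration, keeping a generated head in proportion to a neck-down user representation, and aligned with the neck, can be sketched as a computation on bounding boxes. The sketch assumes image coordinates with the origin at the top-left, boxes given as (x, y, width, height), and a user box whose top edge corresponds to the neck; the head_ratio and aspect parameters and the function name place_generated_portion are hypothetical.

```python
def place_generated_portion(user_box, head_ratio=0.25, aspect=0.8):
    """Return the (x, y, width, height) box for a generated head.

    The head height is a fixed fraction of the height of the user
    representation, so the head grows and shrinks with the user's apparent
    size. The head is centered horizontally and its bottom edge rests on the
    top of the user box, which is taken to be the neck.
    """
    x, y, width, height = user_box
    head_height = head_ratio * height
    head_width = aspect * head_height
    head_x = x + width / 2 - head_width / 2
    head_y = y - head_height
    return (head_x, head_y, head_width, head_height)


# The same user shown far from, and then near to, the viewer of the display.
far_head = place_generated_portion((100, 200, 80, 200))
near_head = place_generated_portion((60, 100, 160, 400))
```

In this sketch, doubling the apparent size of the user representation doubles the size of the generated portion, consistent with the key-frame progression described above.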

FIG. 20 is a flowchart illustrating steps associated with an implementation of the present disclosure, in which an augmented reality virtual videoconference for each of a plurality of computing devices during a networked communication session is provided. At step S102, the networked communication session is defined and provided to a plurality of devices. At step S104, video content that is at least partially captured by a camera associated with a respective device is received. At step S106, a composited interactive audio/video feed comprised of audio/video input received during the networked communication session from each of the first user computing device and at least the respective user computing device of the one of the additional users is generated. At least some of the video content captured by the camera associated with a respective user computing device is removed prior to including the remaining video content in the composited interactive audio/video feed. At step S108, the composited interactive audio/video feed is provided to the plurality of computing devices during the networked communication session.
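By way of non-limiting illustration, the removal of some captured video content before compositing at step S106 can be sketched on small frames represented as rows of pixel values. The sketch assumes that the background of the remote frame can be identified by a single known color, which is a deliberate simplification chosen for this example; the function names remove_background and composite, and the placement offsets, are hypothetical.

```python
def remove_background(frame, background_color):
    """Return a mask that is True for pixels to keep (the foreground)."""
    return [[px != background_color for px in row] for row in frame]


def composite(room_frame, remote_frame, mask, top, left):
    """Paste the kept pixels of remote_frame onto a copy of room_frame.

    The top and left offsets position the remote content within the room
    frame, for example at a seat. Pixels falling outside the room frame
    are skipped.
    """
    out = [row[:] for row in room_frame]
    for r, (frame_row, mask_row) in enumerate(zip(remote_frame, mask)):
        for c, (px, keep) in enumerate(zip(frame_row, mask_row)):
            rr, cc = top + r, left + c
            if keep and 0 <= rr < len(out) and 0 <= cc < len(out[rr]):
                out[rr][cc] = px
    return out


# "R" is a room pixel, "B" a remote background pixel, "U" a remote user pixel.
room = [["R"] * 4 for _ in range(3)]
remote = [["B", "U"], ["U", "B"]]
feed = composite(room, remote, remove_background(remote, "B"), top=1, left=1)
```

Only the pixels belonging to the remote user reach the composited frame, so the room remains visible wherever the remote background was removed.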

Thus, as shown and described herein, technological solutions are provided to the problem of providing real presence over virtually any distance.

Although many of the examples shown and described herein regard providing augmented reality in a videoconferencing environment, the invention is not so limited. Moreover, although illustrated embodiments of the present disclosure have been shown and described, it should be understood that various changes, substitutions, and alterations can be made by one of ordinary skill in the art without departing from the scope of the present disclosure.

What is claimed is:
 1. A system for providing an augmented reality virtual videoconference for each of a plurality of computing devices during a networked communication session, the system comprising: non-transitory processor readable media; at least one processor operatively coupled to the non-transitory processor readable media, wherein the non-transitory processor readable media have instructions that, when executed by the at least one processor, causes the at least one processor to perform the following steps: receive initialization information from a computing device, wherein the initialization information includes video content of a room captured by a camera configured with the computing device; process the initialization information to detect objects in the room, wherein at least one of the detected objects is a whiteboard; provide respective access to a networked communication session to each of a plurality of computing devices, wherein at least one of the computing devices is respectively operated by a user physically located out of the room and at least one of the respective computing devices is respectively operated by a user physically located in the room; receive, during the networked communication session from at least one computing device respectively operated by a user physically located in the room, video content; generate a composited interactive video feed comprised of the video content received from the at least one computing device respectively operated by a user physically located in the room and video content received during the networked communication session from each of at least one computing device respectively operated by a user physically located out of the room; revise, during the networked communication session, the composited interactive video feed by replacing at least part of an appearance of at least one of the detected objects with at least some video content received during the networked communication session from one of the computing devices respectively operated by a user physically located out of the room and further revise, by the at least one processor, during the networked communication session, the composited interactive video feed by altering an appearance of the whiteboard; and provide the revised composited interactive video feed to the plurality of computing devices during the networked communication session.
 2. The system of claim 1, wherein the non-transitory processor readable media have further instructions that, when executed by the at least one processor, causes the at least one processor to: process the initialization information to detect corresponding information associated with the detected objects.
 3. The system of claim 2, wherein the detected objects and corresponding information includes at least one plane of the object.
 4. The system of claim 2, wherein the non-transitory processor readable media have further instructions that, when executed by the at least one processor, causes the at least one processor to: use machine learning to process the initialization information.
 5. The system of claim 4, wherein the machine learning can be implemented for at least image processing.
 6. The system of claim 1, wherein the non-transitory processor readable media have further instructions that, when executed by the at least one processor, causes the at least one processor to: provide an augmented reality view of the remaining video content in the composited interactive audio/video feed as a function of movement of a viewer of the composited interactive audio/video feed.
 7. The system of claim 6, wherein the augmented reality view includes adjusting for skew and angle of view.
 8. The system of claim 1, wherein at least one of the additional user computing device(s) communicate on the networked communication session via one or more of Real Time Streaming Protocol, Web Real-Time Communication and/or hypertext transport protocol live streaming.
 9. The system of claim 1, wherein the non-transitory processor readable media have further instructions that, when executed by the at least one processor, causes the at least one processor to: receive from the additional user computing devices information representing an interaction by the one of the user computing devices; and provide a representation of the interaction to each other user computing device.
 10. A method for providing an augmented reality virtual videoconference for each of a plurality of computing devices during a networked communication session, the method comprising: receiving, by at least one processor, initialization information from a computing device, wherein the initialization information includes video content of a room captured by a camera configured with the computing device; processing, by the at least one processor, the initialization information to detect objects in the room, wherein at least one of the detected objects is a whiteboard; providing, by the at least one processor, respective access to a networked communication session to each of a plurality of computing devices, wherein at least one of the computing devices is respectively operated by a user physically located out of the room and at least one of the respective computing devices is respectively operated by a user physically located in the room; receiving, by the at least one processor during the networked communication session from at least one computing device respectively operated by a user physically located in the room, video content; generating, by the at least one processor, a composited interactive video feed comprised of the video content received from the at least one computing device respectively operated by a user physically located in the room and video content received during the networked communication session from each of at least one computing device respectively operated by a user physically located out of the room; revising, by the at least one processor during the networked communication session, the composited interactive video feed by replacing at least part of an appearance of at least one of the detected objects with at least some video content received during the networked communication session from one of the computing devices respectively operated by a user physically located out of the room and further revising, by the at least one processor, during the networked communication session, the composited interactive video feed by altering an appearance of the whiteboard; and providing, by the at least one processor, the revised composited interactive video feed to the plurality of computing devices during the networked communication session.
 11. The method of claim 10, further comprising: processing, by the at least one processor, the initialization information to detect corresponding information associated with the detected objects.
 12. The method of claim 11, wherein the detected objects and corresponding information includes at least one plane of the object.
 13. The method of claim 11, further comprising: using, by the at least one processor, machine learning to process the initialization information.
 14. The method of claim 13, wherein the machine learning can be implemented for at least image processing.
 15. The method of claim 10, further comprising: providing, by the at least one processor, an augmented reality view of the remaining video content in the composited interactive audio/video feed as a function of movement of a viewer of the composited interactive audio/video feed.
 16. The method of claim 15, wherein the augmented reality view includes adjusting for skew and angle of view.
 17. The method of claim 10, wherein at least one of the additional user computing device(s) communicate on the networked communication session via one or more of Real Time Streaming Protocol, Web Real-Time Communication and/or hypertext transport protocol live streaming.
 18. The method of claim 10, further comprising: receiving, by the at least one processor, from the additional user computing devices information representing an interaction by the one of the user computing devices; and providing a representation of the interaction to each other user computing device.